Survey of Word Co-occurrence Measures for Collocation Detection

Author

Olga Kolesnikova

Abstract

This paper presents a detailed survey of word co-occurrence measures used in natural language processing. Word co-occurrence information is vital for accurate computational text treatment: it is important to distinguish words that combine freely with other words from words whose ability to form phrases is restricted. The latter words, together with their typical co-occurring companions, are called collocations. To detect collocations, many word co-occurrence measures, also called association measures, are used to identify the high degree of cohesion between words in collocations, as opposed to the low degree of cohesion in free word combinations. We describe such association measures, grouping them into classes according to the approaches and mathematical models used to formalize word co-occurrence.
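As a rough illustration of what an association measure computes, the sketch below scores adjacent word pairs with pointwise mutual information (PMI). PMI is a widely used measure of this kind, but the abstract does not name it, so its choice here is an assumption, as are the function name `pmi_scores`, the `min_count` parameter, and the restriction to adjacent pairs.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=1):
    """Score adjacent word pairs by pointwise mutual information (log base 2).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi          # observed probability of the pair
        p_x = unigrams[w1] / n_uni   # probability of the first word
        p_y = unigrams[w2] / n_uni   # probability of the second word
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    text = "strong tea and strong coffee but powerful computer and strong tea"
    for pair, score in sorted(pmi_scores(text.split()).items(),
                              key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

A high score suggests that the two words co-occur more often than chance would predict, which is the kind of cohesion the abstract attributes to collocations; scores near or below zero suggest a free combination. On small corpora PMI tends to overrate rare pairs, which is why a frequency cutoff such as `min_count` is commonly applied.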

Related articles

A Comparison of Co-occurrence and Similarity Measures as Simulations of Context

Observations of word co-occurrences and similarity computations are often used as a straightforward way to represent the global contexts of words and achieve a simulation of semantic word similarity for applications such as word or document clustering and collocation extraction. Despite the simplicity of the underlying model, it is necessary to select a proper significance, a similarity measure...

Measuring the Compositionality of Collocations via Word Co-occurrence Vectors: Shared Task System Description

A description of a system for measuring the compositionality of collocations within the framework of the shared task of the Distributional Semantics and Compositionality workshop (DISCo 2011) is presented. The system exploits the intuition that a highly compositional collocation would tend to have a considerable semantic overlap with its constituents (headword and modifier) whereas a collocatio...

Measuring syntagmatic Fixedness of Multi-Word Expressions

Syntagmatic fixedness is an important feature of multi-word expressions (MWE). However, syntagmatic fixedness is gradual and various semantic and syntactic relations hold among the parts of MWEs. This poses intriguing problems for lexicography, linguistic description and language processing. In this paper we propose a computationally inexpensive and intuitive approach to the measurement of synt...

Extracting Bilingual Collocations from Non-Aligned Parallel Corpora

This paper proposes a new method to find correspondences of uninterrupted collocations from Japanese-English bilingual corpora without sentence-to-sentence alignment. Uninterrupted collocations in English such as “once again”, “give up”, or “gross national product” handled as a single word or a compound word in Japanese, can be automatically extracted with corresponding Japanese words using wor...

Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics

The existing word representation methods mostly limit their information source to word co-occurrence statistics. In this paper, we introduce ngrams into four representation methods: SGNS, GloVe, PPMI matrix, and its SVD factorization. Comprehensive experiments are conducted on word analogy and similarity tasks. The results show that improved word representations are learned from ngram cooccurre...

Journal:

Computación y Sistemas

Volume 20

Publication date: 2016
